Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): Implement naive join ordering #3616

Merged
merged 11 commits into from
Dec 20, 2024

Conversation

desmondcheongzx
Copy link
Contributor

Implements a naive join orderer that simply takes joins relations arbitrarily (as long as a valid join condition exists).

This is intended as a building block to ensure that our join graphs can correctly reconstruct into logical plans. The PR that will immediately follow this will create an optimization rule that applies naive join ordering. The optimization rule will be hidden behind a config flag, but will allow us to test logical plan reconstruction on all our integration tests.

Copy link

codspeed-hq bot commented Dec 19, 2024

CodSpeed Performance Report

Merging #3616 will not alter performance

Comparing desmondcheongzx:naive-join-order (4e6d79f) with main (b87e0a3)

Summary

✅ 27 untouched benchmarks

Copy link

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 92.83820% with 27 lines in your changes missing coverage. Please review.

Project coverage is 77.93%. Comparing base (4567601) to head (4e6d79f).
Report is 45 commits behind head on main.

Files with missing lines Patch % Lines
...src/optimization/rules/reorder_joins/join_graph.rs 91.40% 22 Missing ⚠️
.../rules/reorder_joins/naive_left_deep_join_order.rs 95.45% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3616      +/-   ##
==========================================
+ Coverage   77.69%   77.93%   +0.23%     
==========================================
  Files         710      720      +10     
  Lines       86896    88728    +1832     
==========================================
+ Hits        67513    69149    +1636     
- Misses      19383    19579     +196     
Files with missing lines Coverage Δ
src/common/scan-info/src/test/mod.rs 38.04% <100.00%> (+0.68%) ⬆️
src/daft-logical-plan/src/test/mod.rs 100.00% <100.00%> (ø)
src/daft-physical-plan/src/test/mod.rs 100.00% <100.00%> (ø)
.../rules/reorder_joins/naive_left_deep_join_order.rs 95.45% <95.45%> (ø)
...src/optimization/rules/reorder_joins/join_graph.rs 94.75% <91.40%> (-3.23%) ⬇️

... and 111 files with indirect coverage changes

#[derive(Clone, Debug)]
pub(super) enum JoinOrderTree {
Relation(usize), // (id).
Join(Box<JoinOrderTree>, Box<JoinOrderTree>, Vec<usize>), // (subtree, subtree, nodes involved).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of the nodes involved, you can implement Iterator to give you the nodes involved.

impl Iterator for JoinOrderTree {
...

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I opted for JoinOrderTree.iter() -> JoinOrderTreeIterator which then implements Iterator by maintaining a stack of Joins/Relations.

@desmondcheongzx desmondcheongzx enabled auto-merge (squash) December 20, 2024 08:19
@desmondcheongzx desmondcheongzx merged commit 5d4db4f into Eventual-Inc:main Dec 20, 2024
39 checks passed
@desmondcheongzx desmondcheongzx deleted the naive-join-order branch December 20, 2024 10:37
desmondcheongzx added a commit that referenced this pull request Jan 6, 2025
Missed a comment to
#3616 (comment):
```
In [src/daft-logical-plan/src/optimization/rules/reorder_joins/join_graph.rs](#3616 (comment)):

>  };
 
-#[derive(Debug)]
-struct JoinNode {
+// TODO(desmond): In the future these trees should keep track of current cost estimates.
+#[derive(Clone, Debug)]
+pub(super) enum JoinOrderTree {
+    Relation(usize),                                          // (id).
+    Join(Box<JoinOrderTree>, Box<JoinOrderTree>, Vec<usize>), // (subtree, subtree, nodes involved).
I dont think you need to keep an explicit stack. I think you should be able to something like

std::iter::chain(left.into_iter(), right.into_iter())
```

This PR removes the stack and simply chains the iterators.
desmondcheongzx added a commit that referenced this pull request Jan 8, 2025
Applies the naive left deep join order from #3616 as an optimizer rule.
This optimizer rule is gated behind an environment variable that allows
us to validate the rule on our current workloads.

Currently join reordering results in errors for 50% of TPC-H queries
during join graph building. We'll tackle these in a follow-up PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants